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Abstract 

Background: Although the crocodilepox virus (CRV) is currently unclassified, phylogenetic analyses suggest that its 
closest known relatives are molluscum contagiosum virus (MCV) and the avipox viruses. The CRV genome is 
approximately 190 kb and contains a large number of unique genes in addition to the set of conserved 
Chordopoxvirus genes found in all such viruses. Upon sequencing the viral genome, others noted that this virus 
was also unusual because of the lack of a series of common immuno-suppressive genes. However, the genome 
contains multiple genes of unknown function that are likely to function in reducing the anti-viral response of the 
host. 

Results: By using sensitive database searches for similarity, we observed that gene 157 of CRV-strain Zimbabwe 
(CRV-ZWE) encodes a protein with a domain that is predicted to bind dsRNA. Domain characterization supported 
this prediction, therefore, we tested the ability of the Robetta protein structure prediction server to model the 
amino acid sequence of this protein on a well-characterized RNA binding domain. The model generated by 
Robetta suggests that CRV-ZWE-157 does indeed contain an RNA binding domain; the model could be overlaid on 
the template protein structure with high confidence. 

Conclusion: We hypothesize that CRV-ZWE-157 encodes a novel poxvirus RNA binding protein and suggest that as 
a non-core gene it may play a role in host-range determination or function to dampen host anti-viral responses. 
Potential targets for this CRV protein include the host interferon response and miRNA pathways. 
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Background 

Crocodilepox virus (CRV) is an unclassified member of 
the Poxviridae family and its complete genome spans 
190,054 base pairs [1]. The genome of this species is 
approximately 61.1% G+C, similar to the genomes of 
molluscum contagiosum virus (MCV) and the ORF-like 
viruses. This relatively high G+C% distinguishes these 3 
groups of viruses from other poxviruses; MCV and ORF 
are in separate genera, and it is likely that CRV will also 
be placed into a separate genus when officially classified. 
It is not clear what has driven the genomes of these 3 
groups of viruses to become relatively GC-rich while 
other poxviruses have drifted towards a high A+T con- 
tent. It is important to note that although an overall A 
+T% is often used to characterize poxvirus genomes, the 
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individual genes in the viruses vary widely in nucleotide 
composition; for example CRV and vaccinia virus 
(VACV) genes range in A+T composition from 24-56% 
and 54-73%, respectively. 

The level of conservation between the ortholog sets 
from viruses in the various poxvirus genera varies 
greatly (25-50% aa identity), this reflects varying struc- 
tural and functional constraints on different proteins. 
This level of sequence conservation does not hinder 
identification and inclusion of CRV proteins into pox- 
virus ortholog sets, but it does create problems when 
the CRV proteins of unknown function are searched 
against databases using programs such as BLASTp [2] 
since the additional aa changes generated by the pres- 
sure to switch to G/C nucleotides reduces the percen- 
tage aa identity even though chemically similar aa may 
have been substituted. 

Poxviruses encode numerous proteins that block the 
host anti-viral response, including proteins that resist 
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the actions of both type I and type II interferons (IFN). 
These viruses not only encode soluble receptors that 
block IFN activity (orthologs of VACV-Cop B19, B8) 
and intracellular inhibitors of IFN activity (orthologs of 
VACV-Cop HI, E3, K3, and C7) but also interfere with 
host IFN signalling [3,4], VACV-Cop E3 is a dsRNA 
binding protein that blocks the action of the IFN-induci- 
ble dsRNA-dependent protein kinase (PKR). E3 binds 
both dsRNA and PKR to inhibit its activity as well as 
inhibiting other host molecules that are part of IFN sig- 
nalling [3]. 

CRV is interesting because of its diversity from other 
poxviruses, but it also creates significant problems for 
crocodile farming around the world. Therefore, identify- 
ing gene function is of interest in trying to understand 
how this virus controls host anti-viral responses and for 
potential vaccine design. Such work can also lead to a 
better understanding of the reptilian immune system. 
Therefore we have used a variety of bioinformatics tools 
to try to predict the function of the many CRV proteins 
for which the function is currently unknown. 

Results and Discussion 

Similarity searches for remote homologs of CPV unknown 
proteins 

65 CPV proteins annotated as unknown function in the 
Viral Orthologous Clusters database (VOCs) [5] were 
searched against the National Center for Biotechnology 
Information (NCBI) non-redundant protein database 
using position-specific iterated-BLAST (PSI-BLAST). 
The only significant hits returned by these searches 
were self-matches to CRV proteins or matches to other 
poxvirus orthologs. This was somewhat surprising 
because of the recent explosive growth in genome 
sequencing, but may reflect the paucity of sequencing of 
reptilian or related genomes. 

Next, these protein sequences were processed using 
the remote homology detection server HHPred [6,7] 
through the PDBalert interface [8], which is a more sen- 
sitive tool that uses 2° structure information and makes 
profile-to-profile searches against structural databases. 
Only 5 of the unknown CRV proteins generated signifi- 
cant (probability >80%, e-value < 0.05) hits in the PDBa- 
lert [8] scoring system. For 2 proteins (CRV-ZWE-016 
and CRV-ZWE-028) the matches were due to amino 
acid repeats in the protein sequences. These repeats did 
not match other protein repeats, but they generated 
areas rich in a few amino acids and these triggered 
matches to other proteins with regions in those particu- 
lar amino acids. One match (CRV-ZWE-076), to the 
family of flap endonucleases (VACV strain Copenhagen 
is G5R), had been discovered previously by using PDBa- 
lert [8] (but had not been included in the genome Gen- 
Bank file which is used to populate the VOCs database), 



as had the similarity between CRV-ZWE-078 and the 
NLPC/P60 superfamily of proteins [9]. The final signifi- 
cant match, and the focus of this paper, from the PDBa- 
lert searches was between CRV-ZWE-157 and a series 
of proteins containing dsRNA binding domains (DRBP). 

Characterization and Comparative Analysis of CRV-157 
and dsRNA binding domains 

Prediction of protein function from low similarity align- 
ments, within the so-called twilight zone (<20% iden- 
tity), is difficult and is best approached by gathering 
multiple lines of evidence that can build to form a solid 
conclusion. As determined above using PSI-BLAST, it 
appears that CRV-ZWE-157 is unique in the known 
poxviruses and is not an ortholog of the poxvirus PKR 
inhibitor (VACV-Copenhagen-E3L) that also contains a 
dsRNA-binding domain. The E3L ortholog of ORF 
virus, which also possesses a very GC-rich genome, is 
more than 22% identical to any of the poxvirus ortho- 
logs so it is unlikely that CRV-ZWE-157 simply fails to 
classify with this ortholog family. 

When CRV-ZWE-157 was used as input with the 
PDBalert tool, more than 15 protein hits were matched 
to a dsRNA-binding domain containing protein, each 
with a probability score of over 95% and with e-values 
less than 5.0 x 10" 2 . The significant feature of this result 
is that many of these proteins were distinct with relatively 
low identity scores showing that critical, well-conserved, 
amino acids within these dsRNA binding domains were 
responsible for the significant scores. Following this dis- 
covery, we revisited the PSI-BLAST search results and 
noted that although the E-values for the top two hits 
were very poor (>0.85), they were both proteins that con- 
tained dsRNA-binding domains (data not shown). 

The CRV-ZWE-157 primary sequence was also ana- 
lysed with InterProScan[10], which searches a protein 
against a variety of databases containing confirmed and 
predicted protein motifs/domains. Only the Gene3D and 
Superfamily tools were able detect a dsRNA binding-like 
domain in the early region of CRV-157. Ribonuclease III 
[11], ATP-dependent RNA helicase A [12], Interferon- 
inducible dsRNA-dependent protein kinase activator A 
[13], and Maternal effect protein staufen [14] all contain 
comparable dsRNA-binding domains. 

To appreciate the degree of conservation between the 
primary sequences of CRV-ZWE-157 and 3 distinct 
DRBPs from the PDBalert results, they were aligned 
using multiple sequence comparison by log-expectation 
(MUSCLE) [15] and manually refined using the Base- 
By-Base editor [16]. The alignment (Figure 1) revealed 
that the proteins are only similar over the dsRNA-bind- 
ing domain (amino acidsl-65 of CRV-ZWE-157) and 
that the percent identity between CRV-ZWE-157 and 
any of these DRBPs was very low (10-30%). However, 
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Figure 1 Multiple alignment of CRV-ZWE-157, RISC-loading complex subunit TRBP2 [PDB:3ADL] [29], DGCR8 [PDB:1X47] [45], and 
Staufen RNA binding homolog [PDB:1UHZ] [46]. Upper panel: Only the predicted dsRNA-binding region of CRV-ZWE-157 and the known 
dsRNA-binding regions of the other proteins are shown. Asterisks show amino acids crucial for dsRNA binding, dots indicate other perfectly 
conserved residues. Lower panel: Red bars represent a helices and green arrows represent (3 strands; these are aligned with the amino acid 
sequences above. The ksdssp algorithm was used to determine the secondary structures from the 3D structures or models. 



the small set of conserved amino acids are in fact con- 
served among all of the DRBPs including the key resi- 
dues required for RNA binding. Of the 5 amino acid 
residues critical to Staufen RNA binding [17], 4 are 
absolutely conserved in CRV-ZWE-157 and the other 
proteins (Figure 1). The RNA binding domain of these 
proteins happens to be at the N-terminus, but fre- 
quently they are located at other positions. The RNA 
binding domain of CRV-ZWE-157 is contained within 
the first 65 aa of the 127 aa protein. This also differenti- 
ates CRV-ZWE-157 from the poxvirus E3L orthologs 
that have a Z-DNA binding domain at the N-terminus 
and the dsRNA binding domain at the C-terminus of 
the protein [18]. 

To further assess the prediction that CRV-ZWE-157 
contains a dsRNA-binding domain, we used BindN+ 
[19], a tool that predicts nucleic acid binding regions 
from primary protein sequences. This algorithm also 
indicated that the N-terminal region of CRV-ZWE-157 
could bind RNA and calculated that the sites with the 
highest RNA binding affinities were located 1) at the 
end of the third p strand/start of the second a-helix and 
2) and the end of the first a-helix. This is in agreement 
with the later modeling experiment (see below). 

The secondary structure of almost all dsRNA-binding 
domains consists of two a helices that lie on a three- 
stranded anti-parallel beta sheet to produce a character- 
istic a-P-P-P-a fold [20]. Jpred [21] predicted this exact 
pattern in the N-terminus of CRV-ZWE-157 with minor 
variations in the lengths of the secondary structure units 
(data not shown). 

Structure Modeling of CRV-157 

Given the preceding support that the CRV-ZWE-157 
protein contains a RNA binding domain we submitted 



the protein sequence to the Robetta structural modeling 
server [22]. Robetta produced 5 models of CRV-157 (1- 
127). The 3D models for CRV-ZWE-157 developed 
from Robetta used the human TAR RNA-binding pro- 
tein (TRBP; PDB:2CPN) as a reference parent. TRBP is 
involved in RNA interference (RNAi) and affects the 
pathway through associations with the enzyme Dicer, 
which may be involved in the initiation process of RNAi 
[23]. Dicer requires the association of TRBP, which 
facilitates the recruitment of siRNA to Ago2 bound by 
Dicer and miRNA processing by Dicer [24]. DiGeorge 
syndrome critical region gene 8 (DGCR8), which takes 
part in the recognition of primary microRNA substrates 
prior to cleavage ([PDB:2YT4] [25]), was subsequently 
used for the creation of the full 3D models. However, 
this does not imply that any particular DRBP is signifi- 
cantly more similar to the CRV protein. The Ginzu 
module [26] of Robetta that predicts domain architec- 
ture of proteins also separated CRV-ZWE-157 into the 
DRBP-like N-terminal domain and the C-terminal (aa 
70-127) domain with a confidence of 2.27, which corre- 
sponds to a 96.5% probability in HHSearch [6]; Robetta 
found no hits for the C-terminal domain. I-TASSER 
[27,28], another protein structure and function predic- 
tion server, supported the choice of DGCR8 
([PDB:2YT4] [25]) as the best model for the N-terminal 
domain of CRV-ZWE-157. The results of the Robetta 
prediction are shown in Figure 2 with cartoon and 
space-filling formats. Both the overall structure and dis- 
tribution of surface charge on the predicted CRV-ZWE- 
157 structure are very similar to the known structures 
of the three DRBPs shown in Figure 3. 

The secondary structure of dsRNA-binding domains 
can be divided into three motifs [29]. The first motif is 
an a helix, the second motif consists of the three p 
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Figure 2 Predicted structure of CRV-ZWE-1 57. A) cartoon representation of Robetta CRV-ZWE-1 57 model; B-E) electrostatic surface 
representations of CRV-ZWE-1 57, TRBP2 [PDB3ADL] [29], DGCR8 [PDB:2YT4] [25] and dsRNA-specific editase [PDB:2B7V] [47], respectively. 
Positively and negatively charged surfaces are shown as blue and red, respectively. The electrostatic surface properties were calculated by APBS 
and the surface diagrams are positioned in the same orientation as (A). 




CD E 

Figure 3 Structure of RISC-loading complex subunit TRBP2 [PDB:3ADL] [29]bound to dsRNA. Protein and RNA are shown as blue and pink, 
respectively. The three lysine residues in the second alpha helix are indicated in orange with side chains displayed. 
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strands and the last is another a -helix. As shown in Fig- 
ure 1 (bottom panel) the secondary structure of the 
CRV-ZWE-157 model agrees very well with the actual 
secondary structure of known DRBP structures. Several 
structures of DRBPs bound with RNA have been solved; 
the first and second domains display a relatively neutral 
charge and interact with the dsRNA minor groove, 
while the third domain has a net positive charge that 
supports interactions with phosphate backbone of the 
major groove (Figure 3). Substitutions with arginine in 
this second a helix maintain strong binding to the 
major groove [30]. Generally, the first a-helix and the P 
sheet are less well conserved and are thought to act pri- 
marily as a scaffold for the second helix by binding to 
the RNA minor groove [31]. The DRBPs do not bind 
RNA through specific nucleotide sequences. This is 
thought to be due to the fact that the interactions with 
the minor grooves are water-mediated and allow greater 
flexibility [32]. 

To provide support of the similarity of the CRV-ZWE- 
157 model and the known structures we used Chimera 
[33] to superimpose the CRV and DGCR8 structures 
(Figure 4). These two structures generated a root mean 
square deviation (RMSD) of 1.67 A over 62 alpha 



carbon pairs whereas the RMSD for CRV and TRBP2 
was 1.76 A over 55 alpha carbon pairs. The RMSD for 
the superimposition of DGCR8 and TRBP2 DRBPs was 
only slightly better at 1.502 A over 63 alpha carbons. 

Genomic localization and Possible Origins of CRV-ZWE- 
157 

As discussed above, the CRV-ZWE-157 protein is not 
an ortholog of the VACV-Copenhagen E3L RNA-bind- 
ing protein. The CRV gene is located approximately 25 
kb from the right end of the genome in a region mostly 
populated by hypothetical genes of unknown function 
that are unique to CRV. Thus, there is little information 
to be gleaned from the location of this gene to indicate 
a possible origin. However, it is interesting that the 
CRV-ZWE-157 gene contains only 51.3% G+C, which is 
relatively low for the CRV genome, whereas the set of 
80 core chordopoxvirus CRV genes have a G+C range 
of 52-76% with a mean of 64% and the non-core CRV 
genes have a G+C range of 45-73% and a mean of 59%. 
Thus, this gene and flanking regions appear to be dis- 
tinct from the bulk of the genome and the compara- 
tively low G+C content of CRV-157 offers support for 
its acquisition from the host. There are also numerous 
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such regions scattered throughout the genome that may 
represent areas that have been acquired via horizontal 
transfer of DNA, most probably from a host [34]. 

Conclusions 

This bioinformatics study has gathered several lines of 
evidence to support the hypothesis that the CRV- Z WE- 
157 protein contains an N-terminal dsRNA-binding 
motif. HHpred searches matched a number of DRBPs to 
the CRV protein and Jpred predicted an a-P-P-P-a fold 
that is standard in all DRBPs. Perhaps the most conclu- 
sive evidence is the fact that Robetta successfully mod- 
eled the CRV-ZWE-157 amino acid sequence onto the 
structure of known DRBPs placing the few highly con- 
served amino acids in the correct location for interac- 
tions with dsRNA. The data supports the hypothesis 
that the CRV-ZWE-157 is unique in poxviruses and that 
this gene has been acquired from a CRV host since the 
virus evolved from the last known common ancestral 
virus. 

DsRNA-binding domains are not sequence specific 
and are often present in association with a variety of 
other functioning protein domains including those 
involved in mRNA localization [17], inhibition of inter- 
feron activity [35], Z-DNA binding as part of virus 
pathogenesis [36], and providing platforms for multi- 
protein complex assembly through WW domains 
[37,38]. Unfortunately, analysis of the C-terminal half of 
CRV-ZWE-157 gave no clue as to the possible function 
of this domain. However, given that this protein is 
unique among the poxviruses, it is clearly not one of the 
"essential poxvirus genes". Two possible scenarios are 
that it functions in a unique host-range capacity or as a 
novel virulence factor. For the latter, the dsRNA-binding 
domain suggests possible functions that include the 
blocking of interferon function, well known in pox- 
viruses, or a role in the processing of miRNAs. 

Methods 

All poxvirus genome, gene and protein sequences were 
obtained from the VOCs database [5]; other protein 
sequences were downloaded from the NCBI databases. 
Protein crystal structures models were retrieved from 
the Protein Data Bank (PDB) and referenced by its PDB 
ID in parentheses. Searches for remote homologs were 
initially conducted with PSI-BLAST [2] with subsequent 
searches using HHPred [6,7] via the PDBalert interface 
[8]. InterProScan [10] with its associated domain analy- 
sis tools was run at the European Bioinformatics Insti- 
tute web site. Multiple sequence alignments (MSA) of 
protein sequences were generated using MUSCLE [15], 
which has been incorporated into the Base-By-Base 
MSA editor [16]. Complete poxvirus genomes were 
visualized and analysed using the Viral Genome 



Organizer (VGO; [39]) tool at the Viral Bioinformatics 
Resource Centre (VBRC [40]). 

Prediction of the secondary structure of CRV-ZWE-157 
and other DRBPs was performed with the Jpred secondary 
structure prediction server [21] and BindN was used to 
calculate regions that were likely to bind RNA based on 
amino acid side chain pK a , hydrophobicity, and amino 
acid molecular mass [19]. The Robetta protein structure 
prediction server [22] was used to determine if the amino 
acid sequence of CRV-157 could be modeled onto the 
structure of known DRBPs with confirmation using I- 
TASSER [27,28]. Protein structures were displayed using 
Chimera [33] and JMol [41]. Secondary structure informa- 
tion used in the MSA was extracted from the 3D struc- 
tures using the Kabsch and Sander algorithm for defining 
the secondary structure of proteins (ksdssp) module [42] 
of Chimera. The electrostatic surface representations of 
the CRV-ZWE-157 protein structure and the DRBP tem- 
plate were produced by the Adaptive Poisson-Boltzmann 
Solver (APBS) plugin [43] in PyMOL [44] 
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